Using the Markov Chain Monte Carlo method to make inferences on items of data contaminated by missing values

Authors

  • I. Karangwa
  • D. Kotze
Abstract

Markov Chain Monte Carlo (MCMC) is a method used to estimate parameters of interest under difficult conditions, such as when data are missing or when the underlying distributions do not satisfy the assumptions of maximum likelihood procedures. Its objective is to find a probability distribution, known in Bayesian analysis as the posterior distribution, that can be used to estimate the target parameters. In this paper, we consider the case where data are contaminated with missing values and must therefore be handled with appropriate missing data techniques before inferences are made. We review the mathematics of MCMC procedures in the presence of missing data. We then use real data to compare inferences from three approaches: multiple imputation based on the multivariate normal (MVN) model, which uses the MCMC procedure; case deletion (CD), which discards subjects with missing values from the analysis; and fully conditional specification (FCS) multiple imputation, which uses a sequence of regression models to fill in missing values. Assuming that data are missing completely at random (MCAR) on continuous, normally distributed variables, we obtain the following findings: (1) The higher the proportion of missing data on a variable of interest, the more the relationship between that variable and the dependent variable is distorted, whichever missing data method is applied. (2) The multiple imputation methods produce similar estimates, which are better than those from case deletion. (3) Once the proportion of missing data becomes high, none of the missing data techniques can preserve an initially existing relationship between the dependent variable and some of the covariates of interest in the dataset.
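
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use their data: it simulates a bivariate normal dataset, deletes values of the covariate completely at random, and then estimates the regression slope by case deletion and by MVN-based multiple imputation using a data augmentation sampler (alternating an imputation step and a posterior step). The sample size, missing rate, true slope of 2.0, noninformative prior, number of imputations, burn-in and thinning are all assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def draw_mvn_params(data, rng):
    """P-step: draw (mu, Sigma) given completed data, assuming a noninformative prior."""
    n, p = data.shape
    mean = data.mean(axis=0)
    centered = data - mean
    scatter = centered.T @ centered
    # Sigma ~ inverse-Wishart(n - 1, scatter): invert a Wishart draw built from normals.
    chol = np.linalg.cholesky(np.linalg.inv(scatter))
    z = rng.standard_normal((n - 1, p)) @ chol.T
    sigma = np.linalg.inv(z.T @ z)
    # mu | Sigma ~ N(sample mean, Sigma / n)
    mu = mean + np.linalg.cholesky(sigma / n) @ rng.standard_normal(p)
    return mu, sigma

def impute_x(data, missing, mu, sigma, rng):
    """I-step: draw each missing x from its conditional normal distribution given y."""
    filled = data.copy()
    y_mis = data[missing, 1]
    cond_mean = mu[0] + sigma[0, 1] / sigma[1, 1] * (y_mis - mu[1])
    cond_var = sigma[0, 0] - sigma[0, 1] ** 2 / sigma[1, 1]
    filled[missing, 0] = cond_mean + np.sqrt(cond_var) * rng.standard_normal(y_mis.size)
    return filled

def ols_slope(x, y):
    """Return the OLS slope of y on x and its estimated sampling variance."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1], cov[1, 1]

def mi_slope(data, n_imputations=20, burn_in=100, thin=20, seed=0):
    """Multiple imputation via data augmentation, pooled with Rubin's rules."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(data[:, 0])
    current = data.copy()
    current[missing, 0] = np.nanmean(data[:, 0])  # crude starting values
    estimates, variances = [], []
    for step in range(1, burn_in + n_imputations * thin + 1):
        mu, sigma = draw_mvn_params(current, rng)
        current = impute_x(data, missing, mu, sigma, rng)
        # Keep every `thin`-th completed dataset after burn-in as one imputation.
        if step > burn_in and (step - burn_in) % thin == 0:
            slope, var = ols_slope(current[:, 0], current[:, 1])
            estimates.append(slope)
            variances.append(var)
    m = len(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return np.mean(estimates), within + (1 + 1 / m) * between

def simulate(n=500, missing_rate=0.3, seed=1):
    """Simulated data (hypothetical): y = 1 + 2x + noise, with x deleted under MCAR."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = 1.0 + 2.0 * x + rng.standard_normal(n)
    x_obs = x.copy()
    x_obs[rng.random(n) < missing_rate] = np.nan
    return np.column_stack([x_obs, y])

data = simulate()
observed = ~np.isnan(data[:, 0])
cd_slope, cd_var = ols_slope(data[observed, 0], data[observed, 1])
mi_est, mi_var = mi_slope(data)
print(f"case deletion slope: {cd_slope:.3f} (variance {cd_var:.5f})")
print(f"MVN imputation slope: {mi_est:.3f} (variance {mi_var:.5f})")
```

Raising `missing_rate` in `simulate` gives a rough feel for how the estimates behave as more of the covariate is lost, though a single simulated dataset is no substitute for the real-data comparison reported in the paper.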


Similar resources

Inference about the Burr Type III Distribution under Type-II Hybrid Censored Data

This paper presents statistical inference on the parameters of the Burr type III distribution when the data are Type-II hybrid censored. Maximum likelihood estimators of the unknown parameters are developed using the EM algorithm. We provide the observed Fisher information matrix using the missing information principle, which is useful for constructing the asymptotic confidence...


Markov Chain Monte Carlo Multiple Imputation for Incomplete ITS Data Using Bayesian Networks

Rich ITS data are a valuable resource for transportation researchers and practitioners. However, the usability of this resource is greatly limited by missing data. Many imputation methods have been proposed in the past decade, but some issues remain unaddressed or insufficiently addressed. For example, the missing of entire records, temporal correlation in observations, na...


Spatial count models on the number of unhealthy days in Tehran

Spatial count data are found in many sciences, such as environmental science, meteorology, geology and medicine. Spatial generalized linear models based on the Poisson (Poisson-lognormal spatial model) and binomial (binomial-logitnormal spatial model) distributions are often used to analyze discrete count data in which spatial correlation is observed. The likelihood function of these models i...


MCMC for hidden continuous-time

Hidden Markov models have proved to be a very flexible class of models, with many and diverse applications. Recently, Markov chain Monte Carlo (MCMC) techniques have provided powerful computational tools to make inferences about the parameters of hidden Markov models, and about the unobserved Markov chain, when the chain is defined in discrete time. We present a general algorithm, based on reversib...


Probabilistic analysis of stability of chain pillars in Tabas coal mine in Iran using Monte Carlo simulation

Performing a probabilistic study rather than a deterministic one is a relatively easy way to quantify the uncertainty in an engineering design. Due to the complexity and poor accuracy of the statistical moment methods, the Monte Carlo simulation (MCS) method is widely used in engineering design. In this work, an MCS-based reliability analysis was carried out for the stability of the chain pill...




Publication date: 2013